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ABSTRACT 



We apply statistically rigorous methods of nonparametric risk estimation to the 
problem of inferring the local peculiar velocity field from nearby supernovae (SNIa). 
We use two nonparametric methods - Weighted Least Squares (WLS) and Coefficient 
■ Unbiased (CU) - both of which employ spherical harmonics to model the field and 

use the estimated risk to determine at which multipole to truncate the series. We 
, show that if the data are not drawn from a uniform distribution or if there is power 

I I beyond the maximum multipole in the regression, a bias is introduced on the coefficients 

^ ' using WLS. CU estimates the coefficients without this bias by including the sampling 

-4— > ' 

c/3 ' density making the coefficients more accurate but not necessarily modeling the velocity 

field more accurately. After applying nonparametric risk estimation to SNIa data, 
we find that there are not enough data at this time to measure power beyond the 
I> ■ dipole. The WLS Local Group bulk flow is moving at 538 ± 86 km s~^ towards (/, b) = 

(258° ± 10°, 36° ± 11°) and the CU bulk flow is moving at 446 ± 101 km s'^ towards 
^ ; {l,b) = (273° ± 11°, 46° ± 8°). We find that the magnitude and direction of these 

measurements are in agreement with each other and previous results in the literature. 

m 
o 

Subject headings: 



^ ■ 1. Introduction 



Inhomogeneities in the matter distribution of our universe perturb the motions of individual 
galaxies from the overall smooth Hubble expansion. These motions, called peculiar velocities, 
result from gravitational interactions with the spectrum of fluctuations in the matter-density and 
are therefore a direct probe of the distribution of dark matter. The peculiar velocity of an object 
is influenced by matter on all scales and modeling the peculiar velocity field allows one to probe 
scales larger than the sample. Calculating the dipole moment of the peculiar velocity field, or bulk 
flow, is an example of a measurement which helps us investigate the density fluctuations on large 
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scales. Fluctuations in density on many Mpc scales are well described by linear physics and can be 
used to probe the mass power spectrum while fluctuations on small scales become highly non-linear 
and difficult to model. 

Prom an accurate measurement of the local peculiar velocity field we can infer the properties 
of the dark matter distribution. On scales > 10 Mpc we can use linear perturbation theory to 
estimate the bias free mass power spectrum directly from 



f/(r) 



(1) 



where 5m(r) is the density contrast defi ned by [p — p)/{p), p is the average density, and is the 
matter density parameter (|Peebleslll993l ). In the past, measurements of the matter power spectrum 
using galaxy peculia r velocity catalogs cori s istently produced p ower spectra with large amplitudes 



(jZaroubi et al.l 119971 : iFreudling et al.l Il999l : IZaroubi et al.l l200ll ) . A renewed interest in bulk flow 



measurements has recently produced power spectr a with lower ampli t udes, which is often char- 
acterized by (78 1 which are consistent with WMAP (IPark fc ParkI 120061 : iFeldman &: WatkinsI 120081 : 



Abate k Erdogdull2009l: ISong et al 



cosmology (jKashlinskv et al.ll2008l : IWatkins et al.ll2009l : 



201C 



Lavaux et al 



201011 and some which challenge the ACDM 



Feldman et al.ll20ld : iMacaulav et al.ll20ld ) 



Peculiar velocity measurements also enable one to measure the matter distribution independent 
of redshift surveys. This allows for comparison between the galaxy power spectrum and matter 
power spectrum to probe the bias parameter which specifies how galaxies follow the total under- 
lying matter distribution (jPike &: Hudsonll2005l : iPark &: Parkll2006l : iDavis et al.ll2010al ) . In addition 
to measuring 0, this comparison can also be used to test the validity of the treatment of bias as a 
linear scaling ([Abate et al.l 120081 ). 



By accurately measuring t he local velocity field it i s also possible to limit its effects on de 



Neill et al. 


2007: 


Davis et al. 


2010bl 



comes from comparing an inferred luminosity distance with a measured redshift. This redshift is as- 
sumed to be from cosmic expansion. However, on smaller physical scales where large scale structure 
induces peculiar velocities that create large fluctuations in redshift, this measured redshift becomes 
a combination of cosmic expansion and local bulk motion and thus of limited utility in inferring cos- 
mological parameters from the corresponding luminosity distance. While traditionally this trouble- 
some regi me has been viewed to extend out to z < 0.05, re cent work has shown significant effects out 



to z < 0.1 (ICooray &: Caldwellll2006l : iHui &: Greenell2006l ). Hence peculiar velocities from SNIa add 



scatter to the Hubble diagram in the nearby redshift regime which adds uncertainty to derived cos- 
mological parameters, including the dark energy equation-of-state parameter. In an ongoing effor t 
to probe the nature of dark energy, surveys such as the CfA Supernova Groupl^i Hicken et al. 2009a ). 



' ihttp: //www, cf a. harveird. edu/supernova/ SNgroup.html I 



- 3 - 



SNL^l kstier et alJbooeh. Pan -STA RR#I. ESSENCeFI (iMiknaitis et al.l l2007ll. Carnegie Supernova 
Project (CSP) ^1 dHamuv et~al ''2006': i Folatelli et aPboKIL the Lick Observatory Su pernova Search 



KAIT /LOSS dFi UpDenko ~al.i i2001l: iLeaman et al.1 boiol'l. Nearby SN Factor^§ lAldering et al 



2OO2I ). SkvMappeiFf (|Murphv et al-l bood). Palomar Transient Factorjl^ Law et al.1 bood ) hope to 



obtain tighter constraints on cosmological parameters. With the future influx of SNIa data, statis- 
tical errors will be reduced but an und erstanding of systema tic errors is required to make improve- 
ments on cosmological measurements (jAlbrecht et al.ll2006l ). Averaging over many SNIa reduces 
scatter caused by random motions but not those caused by coherent large scale motions. One 
expects these bulk motions to converge to zero with increasing volume in the rest frame of the 
CMB with the rate of convergence depending on the amplitude of the matter perturbations. This 
fact i notivates dete rmining both the monopole and dipole component of the local peculiar velocity 
field (jZaroubilbooi l. 



To model the local peculiar velocity field or flow field requires a measure of an object's peculiar 
velocity and its position on the sky. The peculiar velocity of an object, such as a galaxy or Type 
la Supernova (SNIa), given the redshift z and cosmological distance d is 

U = Hodi{z) - Hod (2) 

where Hq is the Hubble parameter and HQdi{z) is the recessional velocity described by 

Hodi{z) = c{l + z) [ [nM{l + z'f + ^Ay^^^dz'. (3) 
Jo 

Redshifts to galaxies can be measured accurately with an error ~ 0.001. Therefore, the accuracy 
of a measure of an object's peculiar velocity rests on how well we can determine its distance. 

A variety of techniques exist to det ermine d. Distances to spiral galaxies can be measured 
through the Tully-Fisher (TF) relation (iTullv fc Fished 119771 ) which finds a power law relation- 
ship between the luminosity and rotational velocity. This method has been one of the most suc- 



cessful in generating large peculiar velocity catalogs. The SFI+-I- dataset (IMasters et al.l 12006 



Springob et al.l 120071 ) for example, is one of the largest homogeneously derived peculiar velocity 
catalog using I-band TF distances to ~ 50 00 galaxies with ^ 15 % distance errors. This cata- 



log builds on the Spiral Field I-band (SFI; iGiovanelh et al.l (jlQQd Il995l ): Ida Costa et al.l (jlQed ): 



^http 



//cf ht . hawaii . edu/SWLS/ 1 



//pan-Starrs . if a.hawaii . edu/public/] 



' Ihttp 



//www. ctio .noao . edu/essence/] 



Ihttp 



//cspl . Icq . cl/~ cspuserl/PUB/CSP .htmll 



* ]http 



// snf actory . Ibl . gov/ 1 



' |http 



//www. mso . anu. edu. au/skymapper/| 



^http 



//www, astro . caltech.edu/ptf/] 
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Havnes et all (ll999b|fl) ), Spiral C l uster I -band (SCI; Iciovanelli et al.1 (|l997bl lahl and the Spiral 



Cluster I-band 2 (SC2; bale et alJ (Il999all3'l) catalo gs. The Two Micron All-Sky Survey (2MASS) 



Tully-Fisher (2MTF) survey ([Masters et alJ |2008| ) aims to measure TF distances to all bright 
spirals in 2MASS in the J, H, and K bands. The Kinematics of the Local Universe catalog 



(klunA j Theureaul 1 19981 ). which is the only catalog to exceed SFI-|--|- in number of galaxies, 



consists of B-band TF distances to 6600 galaxies. The velocity widths in this catalog are not 
homogeneously collected and measured adding to the errors. Additionally, the B-band TF re- 
lationship exhibits more scatter - which translates to larger distance errors - than the I, J, 
H, and K bands due to Galactic and internal extinction, making the SFI-I— |- arguably the best 
galaxy peculiar velocity cata log currently available. Finally, with the Square Kilometer Array 



(SKA) (jPewdnev et al.l |2009j) we expect TF catalogs to grow out to larger distances using less 



HI in the near future. 



and shear moments have been made (Giovanelli et al. 1998: Dale et al. ; 


.999b: Courteau et al. 


2000: Kudrva et al. 


2003: Feldman k Watkins 2008: Kudrva et al. 


2009: 


^usser & Davis 


2011) 
1999,: 


and cosmological parameters have been constrained (da Costa et al. 


1998: 


Freudling et al. 


Borgani et al. 2000: 


Branchini et al. 2001: 


Pike Hudson.. 2005.: Masters et al. 2006: Park & Park 


2006: Abate & Erdo&du bood: Davis et al. 


2010a 1. 



Fundamental Plane (FP) distances (jPiorgovski &: Davislll987l ) which express the luminosity of 
an elliptical galaxy as a power law function of its radius and velocity dispersion also enable one to 
generate large peculiar velocity catalogs. Several smaller datasets utilize this distance indicator . 



The Streaming M otions of Abeh Clusters (SMAC) project (ISmith et al.ll200nl : iHudson et al.ll2001 



Smith et al.ll200in is an all-sky FP survey of 69 9 galaxies with ~ 20% distance errors. The EFAR 



project (IWegner et al.l 1 19961 : IColless et al.l l200ll ) studied 736 elliptical galaxies in clusters in two 
regions on the sky to improve distan ce estimates to 85 clusters. Larger FP programs include the 
NOAO Fundamental Plane Survey (ISmith et al.l l2004l ) which will provi de FP measurements to 
^4000 early-type galaxies and the 6 Degree Field Galaxy Survey (6dFGS) (jWakamatsu et al.ll2003l : 
Jones et al.l l2009l ) : a southern sky survey which, in combination with 2MASS, hopes to deliver 
more than 10,000 peculiar velocities using near infrared FP distances. The increase in the number 
of objects makes this catalog competitive even though individual distance errors are greater than 
TF errors. The Dn — a re lation is a reduced parameter version of FP with typical errors of 



Bernardi et al 



25%(lDressler et al.lll987bl '). The Early-type NEARby galaxies (ENEAR) (Ida Costa et al.ll2000bl : 



2OO2I ) project measured galaxy distances based on Djy — a and FP to 1 3 59 and 1107 



galaxies and the Mark HI Catalog of Galaxy Peculiar Velocities (jWillick et al.l 119951 . 119961 . 119971 ) 
contains 3300 galaxies with estimated distances from TF and D„ — cr. Gl obal features of large- 



scale motions (IDressler et al 



Hudson et al 



2001 



200 



4 



1987a 



Colless et al 



catalogs (IDavis et al.l 



Hoffman et al 



199 



2OOII). 



Park 



Hudson et al 



2001 



199S 



Dekel et al. 



199S 



da Costa et al. 



2000a 



and derived parameter s have been measured using these 



2000; iRauzv Hendrvl I2OO0I : IZaroubi et al 



2001 



Nusser et al. 



^http : //klun . obs-nancay . f r/ 
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Other distance measurements to galaxies are more difficult to make or not as precise. iTonrv et al 



(|200ll ) report distances to 300 early type galaxies using Surface Brightness Fluctuations (SBF) 
whose observations were obtained over a period of ~ 10 years. This method measures the luminos- 
ity fluctuations in each pixel of a high signal-to-noise CCD image of a galaxy where the amplitude 
of these fluctuations is inversely proportional to the distance. A ^ 5% distance uncertainty can 
be obtained under the best observing conditions (jTonry et al.ll200d ) making SBF a useful method 



for cz < 4000 km/s. Although it is difficult to create a large catalog of objects, iBlakeslee et al 



19991) used SBF distances t o put constraints on Hq and /3. Tip of the Red Giant Branch (TRGB) 



Madore fc: FreedmanI Il995l ) and Cepheid distances are challenging to obtain as one must have 



resolved stars which limits observations to the local Universe. 

SNIa are ideal candidates to measure peculiar velocities because they have a standardiz- 
able brightness and thus accurate distances can be calculated with less than 7% uncertainty (e.g. 
Jha et al.l 120071 ). Only recently through the efforts like the CfA Supernova Group, LOSS, and CSP 
have there been enough nearby SNIa (~ 400) to make measurements of bulk flows. Measurements 
of the monopole, dipol e, and quadrupol e have been made which find dipole results compatible 



1998 



with the CMB dipole (jCohn et al.l I2OIOI : lHaugb0lle et al.l l2007l : Ijha et al 



2007 



Giovanelli et al 



Riess et al.lll995l ). Measurements of the monopole a s a function of redshift have been used 



to test for a local void (Zehavi et al. 



1998 



Jha et al 



20071 ). SNIa pec uliar velocity measurements 



have also been used t o put co nstraints on powe r spect rum parameters (jRadburn-Smith et al.ll2004l : 



Watkins fc FeldmanI 120071 ) . iHannestad et al.l (j2008l ) forecast the pr ecision with which we wi ll 



be able to probe as with future su r veys like LSST. Following up on ICooray &: Caldwelll (j200d ) ; 
Hui Greend (|2006l ) , I Gordon et al.l (j2008l ) and IPavis et al.l (|2010bl ) investigated the effects of cor- 
related errors when neighboring SNIa peculiar velocities are caused by the same variations in the 
density field. Not accounting for these correlations underestimates the uncertainty as each new 
SNIa measurement is not independent. Recent investigations in using near-infrared measurements 
of SNIa to measure distances have shown promise for better standard candle behavior and the 



potential for more accurate aiid precise distances t o galaxies in the local Universe (jKrisciunas et al, 
2OO4I : IWood-Vasev et ahlbood : iMandel et aPbood ). 



A wide range of methods have been developed to model the local peculiar velocity field. 
Nusser DavisI (|l995l ) present a method for deriving a smooth estimate of the peculiar veloc- 
ity field by minimizing the scatter of a linear inverse Tully-Fisher relation r] where the magnitude 
of each galaxy is corrected by a peculiar velocity. The peculiar velocity field is modeled in terms of a 
set of orthogonal functions and the model parameters are then found by maximizing t he likelihood 



funct ion for me a,suring a set of observ ed ij. This method was applied to the Mark III (jPavis et al 



19961 ) and SFI (jda Costa et al.lll998l ) catalogs. Several other methods have been developed and 
tested on e.g., SFI and Mark III catalogs to estimate the mass power spectrum and compare pecu- 
liar velociti es to galaxy redshift suryeys which utilize or compliment rigorous maximum likelihood 



2000 



techniques (IWillick fc Strauss 



Zaroubi et al 



2001 



1998: 



Freud 



Branchini et al. 



ing et al 



2001 



199S 



Zaroubi et al. 



1999|i 



) . Nonparametric models (jBranchini et al.l 1 19991 ) 



Hoffman fc Zaroubi 
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and orthogonal functions (|Nusser &: Daviall994l : iFisher et al.lll995l ) have been i mplemented to re- 
cover the velocity f ield from galaxies in redshift space. Smoothing methods (jPekel et alj Il999l : 
Hoffman et al.ll200ll ) have also been used to tackle large randoi n errors and systema tic errors as- 
sociated with nonuniform and sparse sampling. More recently, IWatkins et al.l (|2009l ) introduce a 
method for calculating bulk flow moments which are comparable between surveys by weighting the 
velocities to give an estimate of the bulk flow of an ideal ized survey. The varian ce of the difference 
between the estimate and the actual flow is minimized. iNusser David (j201ll ) present ACE (All 
Space Constrained Estimate), a three dimensional peculiar velocity field constrained to match TF 
measurements which is used to reconstruct the bulk flow. 

Comparisons between different SNIa and galaxy surveys and metho ds show that measure 



ment s of the local velocity field are highly correlated and in agreement (IZaroubi 



2003: 



Hudson et al.l 12004; Radburn-Smith et al. 



2004 



Pike Hudson 



200. 



i 



2002: 



Hudson 



Sarkar et al.l 



2007 



Watkins &: Feldman 20071). Howeve r, pe culiar velocity datasets w hich have recen tly been com- 



bined (namely iFeldman et al.l ()2010), but 



Watkins et al 



( 20091 ) and lMa et al.l (|20inl ) also combine 



datasets) are in disagreement with iNusser &: David (120111 ) . Errors in distance measurements and 
the non-uniform sampling of objects across the sky due to the Galactic disk (~40% of the sky) ag- 
gravate the systematic errors (|Zaroubill2002l ). These systematic errors cause inconsistencies among 
the different models and must be dealt with in a statistically sound fashion. Since the field is at a 
point where rough agreement exists between the different methods, and modeling can be done with 
better precision as the amount of data continues to increase, it is time to investigate the systematic 
errors that limit our measurements. 

In this paper we present a statistical framework that can be used to properly extract the avail- 
able flow fleld from observations of nearby SNIa, while avoiding the historical pitfalls of incomplete 
sampling and over-interpretation of the data. In particular, we emphasize the distinction between 
finding a best overall model fit to the data and finding the best unbiased value of a particular 
coefficient or set of coefficients of the model, e.g., the direction and strength of a dipole term due to 
our local motion. The first task is to provide a framework for modeling the peculiar velocity field 
which adequately accounts for sampling bias due to survey sky coverage, galactic foregrounds, etc. 
These methods are discussed in ^ We then introduce risk estimation as a means of determining 
where to truncate a series of basis functions when modeling the local peculiar velocity field, e.g., 
should we fit a function out to a quadrupole term. Risk estimation is a way of evaluating the 
quality of an estimate of the peculiar velocity field as a function of / moment, whose minimum 
determines the optimal balance of the bias and variance. These methods are outlined in ^ In ^ 
and ^we apply these methods to a simulated dataset and SNIa data pulled from recent literature 
and discuss our results. We then apply our methods to simulated data modeled after the actual 
data and examine their performance as we alter the direction of the dipole in ^ In ^ we conclude 
and present suggestions for future work. 
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2. Nonparametric Analysis of a Scalar Field 

A peculiar velocity field at a given redshift can be written as 

Un = f{Xn) + en (4) 

where C/„ is the observed peculiar velocity at position x„ = {On, (t^n) on the sky, e„ is the observation 
error, and / is our peculiar velocity field. Since we expect / to be a smoothly varying function 
across the sky, it can be decomposed as 

oo 

f{x) = Y,P3H^) (5) 

j=0 

where [j = 0, 1, 2...] forms an orthonormal basis and /3j is given by 



/3i = j ^jix)fix)dx. (6) 
In this work we apply the real spherical harmonic basis as we are physically in terested in a 



measurement of the dipole and follow a procedure similar to lHaugb0lle et al.l (|2007l ) . The radial 
velocity field on a spherical shell of a given redshift can be expanded using spherical harmonics: 

oo I 



/ = ^ ^ aimYlm (7) 
/=0 ■m=-l 

oo r I ^ 

= S [ai-mYl-m + aimYlm) + 0/0^/0 } ■ (8) 

/=0 lm=l J 

Using a/^_m = {~^)"^^tm ^i-m. = (~l)'"^«m expansion for the real radial velocity can be 
rewritten as 



oo r Z ^ 
1=0 lm=l J 



(9) 



oo f Z 



= 5Z 1 5^ [25i(a/™)5ft(l^„) - 29(ai„)9(y«„)] + a^llo \ ■ (10) 

1=0 lm=l J 

Our real orthonormal basis is then [Yiq, ^/2R(Yim), —V^'^{Yim),m = 1,...,^]. 

We have peculiar velocity measurements for a finite number of positions on the sky and there- 
fore cannot fit an infinite set of smooth functions. We estimate /0 by 

J 

f{x) = J2f^jM^) (11) 

j=0 



^Following statistical practice we denote an estimated quantity with a hat. 
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where J is a tuning parameter, more precisely the I moment that we fit out to. By introducing 
a tuning parameter our methods are nonparametric; we do not a priori decide where to truncate 
the series of spherical harmonics but allow the data to determine the tuning parameter. Our task 
is now to estimate /3 and determine J. 



2.1. Weighted Least Squares Estimator /3j 



Consider an ideal case where the 2D peculiar velocity field can be represented exactly as a 
finite sum of spherical harmonics, plenty of uniformly distrib uted data is sample d, and the true J 
is chosen as a result. The Gauss-Markov theorem (see, e.g., iHastie et al.ll2009l ) tells us that the 
best linear unbiased estimator with minimum variance for a linear model in which the errors have 
expectation zero and are uncorrelated is the weighted least squares (WLS) estimator. If we define 
Yj as the N x J matrix 

4'o{xi) (piixi) ■■■ (t>j{xi) 

4>0{X2) <Pl{x2) ••• (t>j{x2) 



(12) 



(t>o{xN) 4>i{xn) ••• (I>j{xn) 

and the column vectors /3 j = (/3o, /3j), U = {Ui, ...,U]\f) and e = {ei,...,ej\f) we can then write 

U = Yj(3j + e. (13) 

The WLS estimator, f3j, that minimizes the residual sums of squares is 

h = {yJwYj)-^yJwU (14) 

where the diagonal elements of W are equal to one over the variance and the off-diagonal elements 
are zero. 

Any estimate for / which truncates an infinite series of functions will produce the same overall 
bias on the peculiar velocity field, namely 

oo 

j=J+l 

where "( )" denotes the ensemble expectation value. However, in the case we are considering there 
is no power at multipoles beyond j = J so this bias will go to zero. 

The estimates of the coefficients f3j are also unbiased if the correct tuning parameter is chosen. 
If Yoo and /? oo are defined over the range [J + l,oo) then the bias on /3j (Appendix I A . T]) 

/3j - ih) = {{YfWYj)-^YfW{Y^^^)) (16) 

is a function of all the /3's beyond the tuning parameter, i.e., the lower-order modes are contaminated 
by power at higher Vs. For this case there is no power at multipoles beyond j = J, /3oo = and 
our coefficients are unbiased. 
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2.2. Coefficient Unbiased Estimator (3* 

If our data are not well-sampled, i.e., not drawn from a miiform distribution, or if spherical 
harmonics are not a good representation of the true velocity field then it is possible that there will 
be power beyond the best tuning parameter. This does not indicate a failure in determining the 
tuning parameter via risk (see ^ but is a consequence of the data. 

Ideally we would like to obtain the unbiased coefficients because we tie physical meaning to 
the monopole and dipole. Suppose our dataset x is sampled according to a sampling density h{x) 
which quantifies how likely one is to sample a point at a given position on the sky, then 

'U^,{x)\ _ /(f(x) + e)cP,{x)\ ^^^^ 

(18) 



h{x) I \ h{x) 

f{x)ct)j{x) 



h{x) 

h{x)dx (19) 



f{x)(l)j{x) 



h{x) 

= (20) 
A weighted unbiased estimate of /?j is therefore (see Appendix IA.2P 



N 



Un(f>j{Xn) 



= ^ (21) 



y- 



n=l " 

where (T„ is the uncertainty on the peculiar velocity. We will call this our coefficient-unbiased (CU) 
estimate, 13*. 

Although the CU estimate is unbiased, its accuracy depends on the sampling density. The 
sampling density in most cases is unknown and must be estimated from the data. 



2.2.1. Estimating h{x) 

The sampling density is a normalized scalar field which can be modeled several ways. We 
outline the process using orthonormal basis functions and will continue to use the real spherical 
harmonic basis, (j). We decompose h as 

oo I 

h{x) = o^^<Pi{x) - ^i'^i(^) (22) 

where / is the tuning parameter. We estimate Oj by 

1 ^ 

^^ = ]^E'^^'(^") (23) 

n=l 
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since 

{Zi) = / (i)i{x)h{x)dx = m. (24) 

It is difficult to create a normalized positive scalar field using a truncated set of orthogonal 
functions. In practice there may be patches on the sky which have negative h. Since a negative 
sampling density has no physical meaning, we set all negative regions to zero, add a small constant 
offset component to the sampling density and renormalize. This will prevent division by zero when 
a data point lies in a negative h region. This procedure adds a small bias but is a standard practice 



when using orthogonal functions and small datasets (see, e.g.. lEfromovichlll999l ). 



Is a spherical harmonic decomposition of the sampling density appropriate? While using an 
orthogonal basis is desirable, the choice of spherical harmonics to model a patchy sampling density 
is clearly non-ideal. Smoothing the data with a Gaussian kernel or using a wavelet decomposition 
to model h would be a good alternative, especially when the distribution of data is sparse or if there 
are large empty regions of space. We merely present the formalism to determine h with orthonormal 
functions and encourage astronomers to use sampling density estimation with any basis set. 



3. Determining Tuning Parameter via Risk Estimation 

Recall that the tuning parameter determines at which / moment to truncate the series of 
spherical harmonics. We determine this value by minimizing the estimated risk. The risk is a 
way of evaluating the quality of a nonparametric estimator by balancing the bias and variance 
(Appendix IA.3|1 which determines the complexity of the function we fit to the data. If the bias is 
large and the variance is small, the function will be too simple, under-fitting the data. This would 
be analogous to only using a monopole term when there is power at higher /. If the opposite is 
true, the data are over-fitted, similar to fitting many spherical harmonics in order to describe noisy 
data. 



3.1. Risk Estimation for WLS 



Recall the estimated peculiar velocity field 

f = YjPj = LU (25) 
where we introduce the smoothing matrix L 

L = Yj{YjWYj)-'YjW. (26) 
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Note that the n*^ row of the smoothing matrix is the effective kernel for estimating /(x„). The 
risk is the integrated mean squared error 



RiJ) = (^^flyM-fixn)fj (27) 

and can be estimated by the leave-one-out cross-validation score 

1 ^ 

RiJ) = j;^Y.(^^-f(-n)Mf (28) 

n=l 



where /(-n) is the estimated function obtained by leaving out the n data point (see, e.g.. lWasserman 



200 



^). For a linear smoother in which / can be written as a linear sum of functions, the estimated 



risk can be written in a less computationally expensive form 

N 



where Lnn are the diagonal elements of the smoothing matrix. 

There are a few important things to note. First, the risk gives the best tuning parameter to 
use in order to model the entire function, e.g., the peculiar velocity field over the entire sky for a 
given set of data. This is different than claiming the most accurate component of the field, e.g., the 
best measurement of the dipole. Secondly, the accuracy to which Eq. [29] estimates the risk depends 
on the number of data points used and will be better estimated with larger datasets. Finally, the 
value of the estimated risk changes for different datasets. What is important for comparison are the 
relative values of the risk for different tuning parameters. Although not explored in this paper, one 
can also use the estimated risk to compare bases with which one could model the peculiar velocity 
field. 



3.2. Risk Estimation for CU 

We start by calculating the variance and bias on h. The variance on h is the estimated variance 
on the coefficients Zi given by 

1 ^ 

^^ = ^T.(^^M-Z^f. (30) 



iV2 

n=l 



The bias on h by definition is 



oo 



i=I+l 
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We can only calculate the bias out to the maximum number of independent basis functions, L, less 
than the number of data points. For spherical harmonics, this is given by 



"^21 + 1 < N. (32) 



1=0 

The risk of the estimator is then the variance plus the bias squared 

/ L 

R{I) = Y^al+ {Zl-ah+ (33) 

1=0 i=I+l 

where we have used Eq. [30] to replace the bias squared of with Zf — af and + denotes only the 
positive values. 

Estimating the risk for CU is similar to WLS using Eq. [23 /(xj) must now be calculated 
with the unbiased coefficients /3| and the diagonal elements of the smoothing matrix L^n (see 
Appendix IA.4P are 

7=0 ^{^n)<^n 

Lnn = ^ . (34) 



y- 

n=l 



4. Application to Simulated Data 

To compare WLS and CU we created a simulated dataset with a non-uniform distribution and 
a known 2D peculiar velocity field. We discuss how the dataset is created followed by an application 
of each regression method and a discussion comparing the methods. 



4.1. Simulated Data 

We built the dataset with a non-uniform h using rejection sampling. To do this we start with 
a uniform distribution of points over the entire sky and evaluate a non-uniform sampling density 
at each point according to 

h = iVy20+ (35) 

where is a normalization factor and + indicates only positive values. This has the effect of 
"masking" a region of the sky. We then choose the 1000 most likely points given some random 
"accept" parameter. If the accept value is less then the sampling density value, that point is 
selected. A typical distribution of data points is shown in Figure [H This pathological sampling 
density provides a useful demonstration of the methods. 
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The simulated real 2D peculiar velocity field is described by 

V = isoyoo - 642yio - ioooy2o + K(-38yii + i06iy22 + isoyge + aooyre) 

-9(1146yii + 849y2i + TOrygs) (36) 

and is shown in Figure [2j We assign an error to each data point of 350 km s~^ and Gaussian 
scatter the peculiar velocity appropriately. This error includes the error on the measurement of the 
magnitude, o"^, the redshif t error, g^, and a thermal component of = 300 km s~^ attributed to 
local motions of the SNIa (jjha et alJboOTI ). 




Fig. 1. — 2D distribution of 1000 simulated data points. All plots are in galactic coordinates. 
The distribution was created from the sampling density h = A^y2o+ where A'^ is a normalization 
factor and + indicates positive values. All negative values in the sampling density are set to zero. 
Although this h is not physical, the large empty galactic plane is ideal for testing the methods. 



4.2. Recovered Peculiar Velocity Field from WLS 

To model the peculiar velocity field with non-parametric WLS methods we first determine the 
tuning parameter from the estimated risk. The risk is plotted in Figure [3] as the solid black line. 
In all estimated risk curves we determine the minimum by adding the error to the minimum risk 
and choosing the left-most I less than this value, i.e., we choose the simplest model within the 
errors. The minimum is at / = 6; as there is power beyond this multipole (see Eq. [36]) . we know 
the coefficient estimator will be biased. 

The results from WLS are in the left column in Figure [H The effects of the bias are clearly 
evident. Artifacts appear in the galactic plane where we are not constrained by any data and are 
not accounted for by the standard deviation; it is not wide enough or deep enough. To determine 
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Fig. 2. — 2D simulated peculiar velocity field in km s ^, described by Eq. [36l 

if the power in the galactic plane is a consequence of a specific dataset, we perform 100 different 
realizations of the data. If the artifacts are a function of a specific dataset, we would expect after 
doing many realizations that the combined results, plotted on the right side in Figure HI would 
recover the true velocity field or that the standard deviation would be large enough to account for 
any discrepancies. The plots in the right column demonstrate that this is not merely a result of 
one realization of the data, but a result of the underlying sampling density and power beyond the 
tuning parameter. 

For comparison, we force the tuning parameter to be ^ = 8 and perform the same analysis in 
Figure [5j We confirm that there is no bias, even if the sampling density is non-uniform. WLS now 
recovers the entire velocity field well and has a standard deviation large enough to account for any 
power fit in the galactic plane. By combining many realizations (right) we see the anomalous power 
in the galactic plane average out, doing a remarkable job of recovering the true velocity field. 

4.3. Recovered Peculiar Velocity Field from CU 

Removing the bias on the coefficients requires reconstructing the sampling density from the 
data. The estimated risk for the sampling density is shown in Figure [6] with a minimum at / = 4. 
Using this tuning parameter we reconstruct the sampling density according to ^2.2.11 A contour plot 
of the sampling density is plotted in the top of Figure[7]with the data points overlaid as black circles. 
To investigate how well h is estimated, we combine the sampling density from 100 realizations of the 
data and calculate the mean (middle) and standard deviation (bottom) in Figure [71 The standard 
deviation is about a factor of 20 smaller than the sampling density and so the sampling density is 
well recovered using 1000 data points. 
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2 4 6 8 10 
Multipole (1) 

Fig. 3. — Estimated risk for WLS (black-solid) and CU (red-dashed). The estimated risk for a 
single I is the median value from a distribution of 1000 bootstraps. As I increases, the distribution 
becomes skewed and the estimated risk becomes unstable. We choose the median to be robust 
against outliers. This is crucial for CU as choosing many points with small sampling density in 
the bootstrap can make the risk very large. The error on the estimated risk is the interquartile 
range (IQR) divided by 1.35 such that at low / when the distribution is normal, the IQR reduces 
to the standard deviation. To determine the minimum / in all estimated risk curves we choose 
the simplest model by finding the minimum, adding the error to the minimum, and choosing the 
left-most / less than this value. The minima occur at / = 6 for WLS and / = 8 for CU. There is 
power beyond the tuning parameter for WLS and so there is a bias on the coefficients. However, 
the minimum risk is lower for WLS than CU indicating that our estimate of f{x) is more accurate 
using WLS. 

Having found the sampling density, we estimate the risk for CU as we did for WLS. These 
results are shown in Figure [3] (red-dashed) with a minimum at / = 8. Note that the estimated risk 
at Z = 6 for WLS is lower than at / = 8 for CU. From this we expect WLS to be more accurate 
modeling f{x) where we have data even though there is a bias on the coefficients. 

The results for CU using J = 8 are plotted in Figure El CU does not allow power to be fit in 
regions where there are few data points by accounting for the underlying distribution of the data. 
The standard deviation also accounts for most of the discrepancies seen in the residual plot. We 
also find this method to be robust against the choice of tuning parameter. In Figure [9] we force 
the tuning parameter to be J = 6 and still do not see any artifacts in the galactic plane, although 
we have sacrificed some in overall accuracy. This is expected since the estimated risk is larger at 
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J = 6. 

In Figure [TOl we show distributions of the residuals for each method using the tuning parameters 
•AvLS = 6 and Jqu = 8. On the top row are the residuals defined as the difference between the 
velocity obtained from the regression f{x) and the velocity V given by Eq. [361 These plots tell us 
how well the method is recovering the true underlying velocity field. On the bottom, the residuals 
are the difference between f{x) and the velocity scattered values Vgcat- These plots tell us how well 
the method is fitting simulated data. The narrower spread in WLS in the top plots tell us it is 
estimating the velocity field more accurately where we have sampled. We expect this result because 
the risk for WLS is lower than for CU. Both models are fitting the simulated data similarly and 
have comparable spreads in their distribution (bottom plots). 
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WLS J=6 




Standard Dev. 500 - 




Fig. 4. — Recovered velocity field (top), standard deviation (middle), and residuals (bottom) in 
km s~^ for WLS for one realization of the data (left) and the combined results of 100 realizations 
of the data (right). The left plots were generated by bootstrapping a single dataset 1000 times 
using the tuning parameter J = 6. We calculate the velocity for a set of 10,000 points distributed 
across the sky based on the derived aim coefficients for each bootstrap. These were averaged to 
create a contour plot of the peculiar velocity field (top) and standard deviation (middle). Finally, 
to create the residual plot, we took the difference between the averaged peculiar velocity at each 
point and the peculiar velocity calculated from Eq. [36l We perform the same analysis but combine 
the results of 100 realizations of the data on the right. It is clear that the power in the galactic 
plane is not merely a function of one dataset but a result of power beyond the tuning parameter 
and the underlying sampling density. 
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WLS J=8 




Standard Dev. 500 - 




Fig. 5. — Results for WLS forcing the tuning parameter to be J = 8 created in an identical manner 
to the plots presented in Figure HI We see that for one realization of the data the standard deviation 
is sufficient to account for all of the power in the galactic plane which is not real. The combined 
results from 100 realizations of the data (right) show the artifacts in the galactic plane do go away, 
confirming that they are due to the bias on the coefficients as a result of power beyond the tuning 
parameter in Figure HI 
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Fig. 6. — Estimated risk for the sampling density with a minimum at / = 4. It is difficult to estimate 
the mean and standard deviation of the risk for the sampling density via bootstrapping because 
duplicates and removing points will change the inherent distribution of the data. We therefore 
must use the entire dataset to estimate the risk. This is in contrast to Fig. [3l where we take the 
median of 1000 bootstrap resamples. The errors are estimated by dividing the data into two equal 
subsets, using one to calculate and the other to calculate the estimated risk. This is done 500 
times. The estimated errors are then the standard deviation at each I scaled by 1/2 due to the 
decrease in the number of points used to estimate the risk. 
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Fig. 7. — Typical recovered sampling density for one realization of the data using the tuning 
parameter 1 = 4 (top). Over-plotted are the simulated data points. To ensure a positive definite 
sampling density, we calculate h according to Eq. [221 set all negative values to zero, add a small 
constant to the entire field, and then renormalize. The mean (middle) and standard deviation 
(bottom) of the sampling density are plotted for 100 different realizations. For each realization, the 
sampling density was calculated using the best tuning parameter for that dataset. Over-plotted 
are the simulated data points from one realization. The standard deviation is roughly factor of 20 
lower and so h is well estimated using 1000 data points. 
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Fig. 8. — Recovered velocity field (top), standard deviation (middle), and residuals (bottom) in 
km s~^ for CU for one realization of the data (left) and the combined results of 100 realizations of 
the data (right). These were generated in an identical manner as those in Figure [H By weighting 
by the sampling density, CU does not allow for any power in the galactic plane where there are no 
constraining data. 
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Fig. 9. — Results for CU forcing the tuning parameter to be J = 6. These were generated in an 
identical manner as those in Figure [H The CU method is more robust to our choice of tuning 
parameter. There is power beyond / = 6 but it is not biasing our coefficients as it did for WLS 

(Fig. ED. 
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WLS CU 




Fig. 10. — Distributions of the residuals for each method using the tuning parameters JwLS = 6 
and Jcu = 8. On the top row are the residuals calculated from the difference between the velocity 
obtained in the regression f{x), and the velocity V given by Eq. [36l On the bottom, the residuals 
are the difference between /(x) and the velocity scattered values V^cat- The bottom plots show 
us how well our methods are fitting the data. The top plots show us how well we are recovering 
the true underlying peculiar velocity field where we have data. We see that WLS models the true 
velocity field better as evidenced by the narrower spread in the distribution but that both methods 
fit the "data" equally well. 
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5. Application to Observed SNIa data 

With our framework established and tested, we now analyze real SNIa data. We first introduce 
the dataset and then apply each regression method and present a comparison of the two methods. 



5.1. SNIa Data 



Our data consi st of SNIa pubhshed in lHicken et al.l (j2009al ) (hereafter HOQa): |jha et al.l (120071') 



Hicken et al 



Hamuv et al.l (|l996l ) ; iRiess et al.l (|1999l ) using the distance measurements published in 
(|2009bl ) (hereafter II09b). Not all of the SNIa published in II09a have distance measurements 
published in H09b. The rest were obtained from private communication with the author . We use 
their results from the Multicolor Light Curve Shape method (MLCS2k2) (IJha et al.l 120071 ) with an 
Ry = 1.7 extinction law. The positions of the SNIa are publicly availablel^^l II09b provide the 
redshift and distance modulus /x with an assume d absolute magn itude of My = —19.504. The 
peculiar velocity, U, is calculated according to (see lJha et al.ll2007l ) 



U = Hodi{z) - Fqc^sn 



where Hodi{z) and HodsN are given by 



Hodi{z) = cil + z) I [nM{l + z'f + nx] ^'^dz'. 
Jo 



Hod 



«SN 



65 



;^q0.2(m-25) 



(37) 



(38) 



(39) 



Here z is the redshift in the rest frame of the Local Group I and we assume that Q-m = 0.3, 
Q\ = 0.7, and Hq = 65 km s~^ Mpc~^. Our results are independent of the value we choose for Hq 
as there is a degeneracy between Hq and My. The error on the peculiar velocity is the quadrature 
sum of the error on /i, a recommended error of 0.078 mag (see II09b), cJx, and a peculiar velocity 
error of o"„ = 300 km s~^ attributed to local motions of the SNIa which are on scales smaller than 



those probed in this analysis (jjha et al.l 120071 ). 



We eliminate objects which could not be fit by MLCS2k2, whose first observation occurs 
more than 20 days past maximum B-band light, or which showed evidence for excessive host galaxy 
extinction [Ay < 2). We choose one redshift shell for our analys is due to the rela tively small number 
of objects and consider the same velocity range adopted by Ijha et all l2007l of 1500 km s'^ < 
HodsN < 7500 km s~^. One object, SN 2004ap, has a particularly large peculiar velocity of 
2864 km s^^. Further examination reveals that this supernova, when modeled with MLCS2k2 with 
Ry = 3.1, has its first observation at 20 days past maximum B-band light. To be conservative, we 



^ ^http: //www, cf a.harveird. edu/iau/lists/Supernovae .html| 
^ ^http: //nedwww. ipac . caltech. edu/help/velc_help .html| 
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exclude this object. This leaves us with 112 SNIa whose peculiar velocity information is recorded 
in Table [5TT1 In this table we include all SNIa with HqcIsn < 7500 km s~^ for completeness. 
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Table 1. SNIa Data 
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Table 1 — Continued 
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08 
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03 
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U 
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16 
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01 
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66 
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U 
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09 


21:52 


07 
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U 
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0.022 






1999cl 


12 
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01 
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30 
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U 
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0.057 


0.045 


-410 


475 


1999cw 


00 


20:01 


46 


-06:20:03.6 
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QO 

oz 


(DO 


U 




0.330 


0.076 


1599 


322 


1999da 


17 


35:22 


96 


60:48:49.3 





013 





001 


66 


not; 


U 


Uo i 


0.066 


0.049 


136 


488 


1999dk 


01 


31:26 


92 


14:17:05.7 





014 





001 


Q A 


ioi 


U 


U / D 


0.252 


0.058 


278 


503 


1999dq 


02 


33:59 


68 


20:58:30.4 





014 





001 


66 


(yjo 


U 


Uoz 


0.299 


0.051 


893 


411 


1999ee 


22 


16:10 


00 


-36:50:39.7 





Oil 





001 


66 


0/1 


n 
U 


Uoo 


0.643 


0.041 


130 


476 


1999ek 


05 


36:31 


60 


16:38:17.8 





018 
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Q /I 


o /y 


U 


izo 


0.312 


0.156 


406 


516 


1999gd 


08 


38:24 


61 


25:45:33.1 
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Q /I 
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U 


1 flO 

iUz 
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0.066 


-872 
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2000ca 


13 
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98 
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OO 
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fl 

u 


U ( 1 


0.017 


0.015 
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2000cn 


17 


57:40 


42 
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OO 


Uo / 


U 


UoO 
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2000cx 


01 
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15 
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61 
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fl 
U 


Uo/ 


0.006 
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446 


444 


2000dk 


01 
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52 


32:24:23.2 
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OA 
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fl 

u 
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Uo4 
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20 
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77 
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u 
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33 
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10 
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34 
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490 
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50 
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482 





089 
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0.035 


08 
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57:24 


93 


25 
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34 


047 
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349 
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2002bo 
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18:06 


51 


21 


49:41.7 
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32 
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077 


0.908 


0.050 


-579 
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2002cd 
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23:34 


42 


58 


20:47.4 
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33 
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1.026 
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04 
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2002cr 


14 


06:37 


59 


-05:26:21.9 
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33 


458 





085 
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-465 
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2002dj 
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34 


-19:31:08.7 
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33 
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2002do 
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56:12 


88 


40:26:10.8 
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34 


340 





no 


0.034 


0.034 


336 


539 


2002dp 
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28:30 


12 
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33 
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0.090 


449 


490 
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88 


07:59:44.8 
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083 
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99 
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2002fk 
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22:05 
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32 
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0.034 


0.023 


50 


452 


2002ha 


20 


47:18 
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00:18:45.6 
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34 


013 
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0.032 
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490 
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19:58 


83 
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35 
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0.031 


0.026 
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06:49 


06 


08:37:48.5 
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001 


34 
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0.605 


0.099 


754 


497 
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Table 1 — Continued 



Name 




RA 
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Av 
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02 


51 
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In Figure [TT] we plot the distribution of the data. One can clearly see the dipole with signifi- 
cant concentrations of negative peculiar velocities around {l,b) = (260°, 40°) and positive peculiar 
velocities around (/, b) = (100°, —40°). As we will explore in future figures (e.g. fig. [T5|) this dipole 
structure is consistent with the dipole in the temperature anisotropy of the Cosmic Microwave 
Background. 




Fig. 11. — Sky distribution of 112 SNIa taken from lHicken et al.l (j2009al ) in Galactic longitude and 
latitude. The velocity range considered is 1500 km s~^ < HqcIsn < 7500 km s^^ where dsN is the 
luminosity distance. The color/shape of the points indicates positive (red-triangle) and negative 
(blue-square) peculiar velocities and the size corresponds to the magnitude of the peculiar velocity. 
From this figure one can clearly see a dipole signature between the upper-left and lower-right 
quadrants. 



5.2. WLS and CU Regressions on SNIa Data 

The first step in estimating the field with CU is to calculate the sampling density. We follow 
the procedure described in ^4.31 and plot the results in Figure [T21 Choosing the simplest model 
gives us a tuning parameter of / = 6. The high / moment is necessary to describe the patchiness of 
the data distribution. The sampling density field is shown in Figure [T3l While it should be unlikely 
that we sample many data points in regions with low sampling density, there are some regions of 
the sky where the sampling density is very low and we have a data point. This discrepancy, in 
combination with a relatively flat estimated risk function is an indication that there are likely better 
basis functions than spherical harmonics to use to estimate h. However, as discussed in ^2.2.1l thev 
will serve for the purposes of demonstrating our method. 
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Fig. 12. — Estimated risk for sampling density with a minimum at / = 6 generated in a similar 
fashion to Fig. [6l The high I moment is an indication of the lumpiness in our sampling density. 



The estimated risk for CU and WLS are plotted in Figure [TH We find the tuning parameter to 
be J = 1 for both methods and the risk values to be very similar. From this we expect that the two 
methods will be consistent and recover the velocity field with similar accuracy. The current SNIa 
data are insufficient to detect power beyond the dipole. Using this tuning parameter we calculate 
the aim coefficients and the monopole and dipole terms from the following equations 



Monopole 



Dipole 




-arctan 



+ 3f?(aii 
9(aii) 



arccos 



)2 + 9(aii)2 



Va^o + K(aii)2 + 9(aii)2 



(40) 

(41) 
(42) 

(43) 



These results are summarized in Table [2j 



The velocity fields from WLS and CU are plotted in Figure [T5j The magnitudes are comparable 
between the two methods, with WLS being slightly larger. The direction of the CU dipole points 
more toward a region of space which is well sampled. The WLS dipole is pulled toward a less 
sampled region which may be why the bulk flow measurement is larger in magnitude. This is 
explored more in ^ 
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Table 1 — Continued 



Name 


RA 












Av 


0"Av 


U 






h m s 


O / // 






mag 


mag 


mag 


mag 


km 


km s-^ 


2007ca 


13:31:05.81 


-15:06:06.6 


0.015 


0.001 


34.622 


0.096 


0.580 


0.069 


-1337 


599 


2007ci 


11:45:45.85 


19:46:13.9 


0.019 


0.001 


34.290 


0.090 


0.074 


0.063 


690 


434 


2007cq 


22:14:40.43 


05:04:48.9 


0.025 


0.001 


35.085 


0.101 


0.109 


0.059 


1399 


558 


2007s 


10:00:31.26 


04:24:26.2 


0.014 


0.001 


34.222 


0.074 


0.833 


0.054 


-942 


523 


2008bf 


12:04:02.90 


20:14:42.6 


0.025 


0.001 


35.174 


0.078 


0.102 


0.049 


271 


535 


2008L 


03:17:16.65 


41:22:57.6 


0.019 


0.001 


34.392 


0.193 


0.036 


0.033 


1117 


602 



''SNIa RA and Dec [J2000] from http: //www, cf a .harvard . edu/iau/lists/Supernovae .html | 

''Redshift in the rest frame of the Cosmic Microwave Background 

^Includes the error on ^, a recommended error of 0.078 mag (see H09), o^, and a peculiar velocity error of 
(Ti, — 300 km s~^ due to local motions on scales smaller than those probed by this analysis. 



Table 2. Summary of Results 





WLS 




CU 


Monopole 


149 ± 52 km s"^ 


Monopole 


98 ± 45 km s"^ 


Dipole 


538 ± 86 km s"^ 


Dipole 


446 ± 101 km s"^ 


Galactic / 


258° ± 10° 


Galactic I 


273° ± 11° 


Galactic b 


36° ± 11° 


Galactic h 


46° ± 8° 
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Fig. 13. — Recovered sampling density using I = 6 as the tuning parameter. After calculating h 
according to Eq. [22} the negative values were set to zero, a small constant of 0.05 was added, and 
the sampling density was renormalized. Data points residing in low sampling density regions may 
be an indication that spherical harmonics are not the best way to decompose h. Same as Fig. 15.11 

To compare the CU and WLS bulk motions, we use a paired t-test. Because the coefficients 
were determined from the same set of data there is covariance between the parameters estimated 
from the two methods. Consider bootstrapping the data N times. For a single bootstrap let X be 
a coefficient from CU and Y be the same coefficient but derived from WLS. The paired t-statistic 
comes from the distribution X — Y 

{X -Y) 

t = — — (44) 

where (Tx-y is the standard deviation of the X — Y distribution and (X — Y) is the mean. According 
to the central limit theorem, for large samples many test statistics are approximately normally 
distributed. For normally distributed data, t < 1.96 indicates that the values being compared are 
in 95% agreement. Performing a paired t-test on the measurements finds that the coefficients are 
in 95% agreement with the results summarized in Table [3l Because the risk values are so similar 
(Figure [HI) , we expect the methods to model the peculiar velocity field equally well and therefore 
expect the values to be statistically consistent. 

To compare two independent measurements we perform a two-sample t-test, which gives us a 
statistical measure of how significant the difference between two numbers are. We first calculate 
the standardized test statistic t = (xi — X'i)l \J a\ + Cg, where xi and X2 are the mean values of 



- 33 - 



O 



• 1—1 

W 



10 
8 



6 



2 








WLS 

cu 



2 3 
Multipole (1) 



Fig. 14. — Estimated risk from the mean and standard deviation of 10,000 bootstraps as a function 
of / moment for WLS (solid-black) and CU (dashed-red) using 112 SNIa. We find the minimum to 
be at i = 1 for both methods, suggesting that the data are inadequate for detecting the quadrupole. 
We expect the velocity fields derived from CU and WLS to be consistent as indicated by the similar 
risk values. 



two measurements to be compared and a\ and are the associated uncertainties. This statistic 
is suitable for comparing the CU or WLS bulk motion with the CMB dipole. We find the WLS 
Local Group bulk flow moving at 538 ± 86 km s"^ towards (/, h) = (258° ± 10°, 36° ± 11°) which is 
consistent with the magnitude of the CMB dipole (635 km s~^) and direction (269°, 28°) with an 
agreement of tdip = 1.12, ti = 1.1, and if, = 0.73. The CU bulk flow is moving at 446 it 101 km s~^ 
towards {l,b) = (273° ± 11°, 46° ± 8°). The CU bulk flow is in good agreement with the CMB dipole 
with tdip = 1-88, ti = 0.36, and h = 2.25. 

There is no strong evidence for a monopole component of the velocity field for either method. 
This merely demonstrates that we are u sing consistent values of My and Hq. For this analysis to 



be sensitive to a "Hubble bubble" (e.g., iJha et al.l 120071 ). we would look for a monopole signature 
as a function of redshift. 

Table 3. Paired t-test results 



«oo aio ^{aii) 9(aii) 



1.50 0.23 1.72 



1.66 
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We can directly compare our results to those obtained in lJha et alj (|2007l ) using a two-sample 
t-test as our analysis covers the same depth and is in the same reference frame. They find a velocity 
of 541 ± 75 km s'^ toward a directio n of = (258° ± 18°, 51° ± 12°). Our results for WLS 
and CU are compatible with iJha et al.l (1200711 wi t h t < 1 in magnitude and direction. We can also 
compare our results to those in iHaugbolle et al.l (|2007l ) for their 4500 sample transformed to the 
Local Group rest frame. They find a velocity of 516 km s~^ toward {l,b) = (248°, 51°). Their 
derived amplitude is slightly lower as their fit for the peculiar velocity field includes the quadrupole 
term. We note from the estimated risk curves, that it is not unreasonable to fit the quadrupole 
as the estimated risk is similar at / = 1 and 1 = 2. However, it is unclear if fitting the extra term 
improves the ac c uracy with which the field is modeled. Our results for CU and WLS agree with 



Haugb0lle et al.l (|2007l ) with t < 1. Note that these t values may be slightly underestimated as 



a subset of SNIa are common between the two analyzes. A summary of dipole measurements is 
presented in Table HI 



6. Dependence of CU and WLS on Bulk Flow Direction 

The WLS and CU analyses on real SNIa data give dipole directions that follow the well-sampled 
region. This may raise suspicion that the CU method is following the sampling when determining 
the dipole. In this section we examine the behavior of our methods on simulated data as we vary 
the direction of the dipole. 

We create simulated data sets from the sampling density derived from the actual data to verify 
the robustness of our analysis. We test two randomly chosen bulk flows which vary in magnitude 
and direction and sample 200 SNIa for each case. One dipole points toward a well-populated region 
of space and the other into a sparsely sampled region. A weak quadrupole is added such that the 
estimated risk gives a minimum at / = 1. There is power beyond the tuning parameter so we expect 
a bias to be introduced onto the coefficients for WLS. The velocity fields for the two cases are given 

by 

Case I : V = 400^01 + 5905Ryii + 8309^11 



Table 4. Summary of Dipole Results 



Method 


# 
SNIa 


Redshift Range 
CMB 


Depth 
km s~^ 


Magnitude 
km s~^ 


Direction 
Galactic {I, b) 


WLS 
CU 

Hauebolle et al. f2007) 
Jha et al. (2^7) 


112 
112 
74 
69 


0.0043-0.028 
0.0043-0.028 
0.0070-0.035 
0.0043-0.028 


4000 
4000 
4500 
3800 


538 ± 86 
446 ± 101 
516 ± 
541 ± 75 


(258°, 36°) ± (10°, 11°) 
(273°, 46°) ± (11°, 8°) 
(248°,51°)±(i5:,i|:) 

(258°, 51°) ± (18°, 12°) 
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-iooy2o + 2oo3?y2i + 2509y2i - i755}?y22 + 1403=^22 

Case 2 : V = -U2Yoi + -88^^11 + 8109^11 

-iooy2o + 2ooKy2i + 2509y2i - i75Ky22 + i403=y22 

For Case 1 the true dipole points along a sparsely sampled direction (Fig. [T6|) . In the top 
row are the simulated velocity field and the dipole component of that field. On the bottom are 
the results for CU and WLS. In all plots the true direction of the dipole is shown as a white 
circle. As WLS is optimized to model the velocity field, it is no surprise that WLS overestimates 
the magnitude of the dipole (bottom left) to better model the simulated velocity field (top left). 
This behavior is very similar to what we saw in ^ WLS is aliasing power onto different scales to 
best model the field, sacrificing unbiased coefficients. If we compare the CU velocity field to the 
simulated velocity field we see that it is less accurate but that CU's estimate of the dipole is a more 
accurate measure of the true dipole (top right). Both methods are recovering the direction of the 
dipole at roughly the 2a level, leading us to conclude that it is the magnitude of the dipole which 
is most variable between the methods for this case. 

One may more easily see the difference in WLS and CU determined coefficients in Figure [TT} 
where we plot the difference distributions (regression determined coefficients minus the true coeffi- 
cients) for 770 simulations of 200 data points. Ideally these distributions would be centered at zero 
with a narrow spread. The distance the mean of the distribution is from zero is an indication of 
the bias. The spread is an indication of the error. We see that WLS is more biased than CU but 
the uncertainty in CU is much larger. 

In Case 2 the true dipole points along a region of space which is densely sampled (Fig. [T8]) . 
In the bottom left plot we see the direction of the dipole for WLS is pulled down toward a region 
of space which is less sampled. Since the true direction of the dipole is well constrained by data, 
to more accurately model the flow field WLS must alter the direction of the dipole toward a less 
sampled region. This is necessary as WLS is trying to account for power which is really part of 
the quadrupole with the dipole term. As a result, WLS misses the true direction of the dipole at 
the 95% confidence level. This may be similar to what we see in Fig. [15] where the WLS dipole 
points more along the galactic plane when compared to CU. CU is less sensitive to this affect as it 
is optimized to find unbiased coefficients. Correspondingly, CU encloses the true direction of the 
dipole at the 95% confidence level. In Figure [19] we plot the difference distributions as we did for 
Case 1. It is clear that the WLS coefficients are more biased than CU but that the uncertainty in 
CU is much larger. 

We can explicitly check the bias of the methods using the simulated data of ^ The important 
calculation is the probability that the 95% confidence interval for a given simulation includes the 
true value. For an accurately determined confidence interval, this should happen 95% of the 
time. We start with one simulated dataset and perform 1000 bootstrap resamples. This gives 
us distributions of the coefficients from which we can determine the confidence intervals. We then 
determine if the true values falls within this interval. After doing this for all of the simulations from 
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^ we can measure how often the true value fahs within the confidence interval. These probabilities 
are summarized in Table [5j 

CU is more accurate in its estimate of the 95% confidence interval for both cases. The lower 
probabilities for WLS are a result of the bias in the method. By construction, the WLS confidence 
intervals are centered about the regression-determined coefficients. If the coefficients are biased, the 
WLS intervals are shifted and the true value will lie outside this interval more often than expected. 



7. Conclusion 

In this work, we applied statistically rigorous methods of nonparametric risk estimation to the 
problem of inferring the local peculiar velocity field from nearby SNIa. We use two nonparametric 
methods - WLS and CU - both of which employ spherical harmonics to model the field and use the 
risk to determine at which multipole to truncate the series. The minimum of the estimated risk 
will tell one the maximum multipole to use in order to achieve the best combination of variance 
and bias. The risk also conveys which method models the data most accurately. 

WLS estimates the coefficients of the spherical harmonics via weighted least squares. We 
show that if the data are not drawn from a uniform distribution and if there is power beyond 
the maximum multipole in the regression, WLS fitting introduces a bias on the coefficients. CU 
estimates the coefficients without this bias, thereby modeling the field over the entire sky more 
realistically but sacrificing in accuracy. Therefore, if one believes there is power beyond the tuning 
parameter or the data are not uniform, CU may be more appropriate when estimating the dipole, 
but WLS may describe the data more accurately. 

After applying nonparametric risk estimation to SNIa we find that there is not enough data at 
this time to measure power beyond the dipole. There is also no significant evidence of a monopole 
term for either WLS or CU, indicating that we are using consistent values of Hq and My. The 
WLS Local Group bulk flow is moving at 538 ± 86 km s'^ towards (/, b) = (258° ± 10°, 36° ± 11°) 

Table 5. Probability of the 95% confidence interval containing the truth 





«oo 


aio 


3^(aii) 


9(aii) 


Case 1 










WLS 


0.50 


0.88 


0.86 


0.88 


CU 


0.93 


0.90 


0.89 


0.92 


Case 2 










WLS 


0.49 


0.88 


0.86 


0.88 


CU 


0.93 


0.94 


0.94 


0.91 



and the CU bulk flow is moving at 446 ± 101 km s"^ towards {I, b) = (273° ± 11°, 46° ± 8°). After 
performing a paired t-test we find that these values are in agreement. 

To test how CU and WLS perform on a more realistic dataset, we simulate data similar to the 
actual data and investigate how they perform as we change the direction of the dipole. We find for 
our two test cases, that CU produces less biased coefficients than WLS but that the uncertainties 
are larger for CU. We also find that the 95% confidence intervals detemined by CU are more 
representative of the actual 95% confidence intervals. 

We estimate using simulations that with ~200 data points, roughly double the current sample, 
we would be able to measure the quadrupole moment assuming a similarly distributed dataset. 
Nearby SNIa programs such as the CfA Supernova Group, Carnegie Supernova Project, KAIT, 
and the Nearby SN Factory will easily achieve this sample size in the next one to two years. 
The best way to constrain higher-or der moments however, would be to obtain a nearly uniform 



distribution of data points on the sky. lHaugb0lle et al.l (|2007l ) estimate that with a uniform sample 
of 95 SNIa we can probe I = 3 robustly. 

With future amounts of data the analysis can be expanded not only out to higher multipoles, 
but to modeling the peculiar velocity field as a function of redshift. This will enable us to determine 
the redshift at which the bulk flow converges to the rest frame of the CMB. Binning the data in 
redshift will also allow one to look for a monopole term that would indicate a Hubble bubble. 

As there is no physical motivation for using spherical harmonics to model the sampling density, 
future increased amounts of data will also allow us to use nonparametric kernel smoothing both to 
estimate h and the peculiar velocity field; this would be ideal for distributions on . the sky which 



subte nd a small angle, like the SDSS-II Supernova Survey sample (jSako et al.ll2008l : iFrieman et al 



20081) 



As astronomical data sets continue to grow, it becomes increasingly important to pursue sta- 
tistical methods like those outlined in this work. 
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A. Appendix 

A.l. Bias on WLS coefficients /3j 

To determine the bias on the estimated coefficients, recall that we can model any velocity 
field with an infinite set of spherical harmonics 



U = Y(3 + e 

where f3 is the column vector given by /3 = {Po-.-Poo) and 



Y 



(t>i){xi) (t)i{xi) 



4>oo{xi) 
4>oo{x2) 



(t>o{xN) (t>l{xN) ••• (l^ooixN) 

If we substitute U into Eq. [H] we get 

h = {YjWYj)-^ Yj W (y/3 + e) 

If we decompose Y (3 into 



where /3oo = {Pj+i-.-PooV and 



then 



Y/3 = Yj/3j + yoo/3 o 



(j)j+l{xi) (j)j+2ixi) 



0oo(a:i) 

4'oo{x2) 



The bias on /3j is 



(3j-(f3j 



(t>J+l{xN) 4'J+2{xn) ••• 4>oo{xn) 



{YjWYj)-^ Yj W + yoo/3oo + e) 

/3j + {YjWYjr^ Yj W {Y^(3oo + e) 



f3j - {/3j + {YjWYj)-^ Yj W (yoo/3oo + e)) 
{{Y^WYy^Y^ W {Y^Poo)) . 



(Al) 



(A2) 



(A3) 
(A4) 



(A5) 



(A6) 
(A7) 



(A8) 
(A9) 
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A. 2. Bias on CU Coefficients /?* 

To determine the bias for the weighted coefficients /3* we first multiply the top and bottom of 
Eq.EUby l/N 

j_ Un(l)j{Xn) 
N 

^7 = ^"^hr—^ (Aio) 



n=l " 



Un<j>j{Xn) 



Xn), and h(^Xn) are all independent of Cn, then 

' Un(t>j{Xn) ' 



Un<Pj(x„) \ / J_ 
1 



So our bias is /3 — (/?*) = 0. 



(All) 



= ^ ^ (A12) 



(A13) 



(A15) 



A. 3. Risk Estimation 

The risk is a way of determining how many basis functions should be in f{x) and can be 
written as 

R = ((e-ef'^ (A16) 

where 9 is the estimated or measured value of some true parameter, 6. The expectation value of 
is the mean, 9 

e = (9) (A17) 

By adding and subtracting the mean from (9 — 9) in Eq. IA161 the risk can be written in terms of 
the variance and the bias. 

R = l^{e - 9 + 9 - 9 f'^ (A18) 
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= {{e-ef) + {e-ef + {e-e){(e-e)) (ai9) 

= {0 - ef) + - (A20) 

= Var(^)+bias2 (A21) 

A. 4. Smoothing Matrix for CU Regression 



3=0 

N 



E 



Un<f>jiXn) 



3=0 



N 

E 

n=l 



(t>j{Xi)(j)j{Xn) 



E 



AT 



n=l 



X] 0-2 

n=l 



(A22) 



(A23) 



(A24) 
(A25) 
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Fig. 15. — Peculiar velocity field for WLS (top) and CU (bottom) using the tuning parameter 
J = 1. The solid white circle marks the direction of the regression dipole, the white asterisk marks 
the CMB dipole in the rest frame of the Local Group, and the black triangles and squares mark the 
data points with positive and negative peculiar velocities. Contours are given to mark the 65% and 
95% confidence bands of the direction of the dipole. The color scale indicates the peculiar velocity 
in km s~^. We see that WLS and CU are in 95% agreement with the direction of the CMB dipole. 
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Simulated Velocity Field Simulated Dipole Component 




Fig. 16. — Simulated velocity field for Case 1 (top left), dipole component of that field (top right), 
WLS dipole result (bottom left) and CU dipole result (bottom right) for a typical simulation of 200 
data points. Data points are overlaid as triangles (positive peculiar velocity) and squares (negative 
peculiar velocity). Error contours (68% and 95%) are marked as black lines. The 95% contour for 
CU and WLS enclose the direction of the true dipole, marked as a white circle. The WLS result is 
more representative of the actual field while CU has a more accurate dipole. 



- 51 - 




-600 -400 



-200 200 
aOO Residuals 



400 



600 



>. 100, 
^ 80^ 




-600 -400 



-200 200 
alO Residuals 



400 



600 



100 




-600 -400 



-200 200 
Re(all) Residuals 



400 



600 




-600 -400 



-200 200 
Im(all) Residuals 



400 



600 



Fig. 17. — Distribution of CU (WLS) coefficients minus the true values for Case 1 in red (blue) 
for 770 simulated datasets. The vertical lines indicate the mean of the distribution. The distance 
the mean is from zero is an indication of the bias. The spread in the distributions indicates the 
uncertainty. WLS is more biased than CU but CU has larger uncertainties. 
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Fig. 18. — Simulated velocity field for Case 2 (top left), dipole component of that field (top right), 
WLS dipole result (bottom left) and CU dipole result (bottom right) for a typical simulation. Data 
points are overlaid as triangles (positive peculiar velocity) and squares (negative peculiar velocity). 
In this scenario, the 95% contour for WLS, marked in black, completely misses the direction of the 
true dipole, marked as the white circle. The WLS dipole is pulled toward a region of space less 
sampled. 
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Fig. 19. — Distribution of CU (WLS) coefficients minus the true values for Case 2 in red (blue) for 
874 simulated datasets. The vertical lines indicate the mean of the distribution. WLS is consistently 
more biased than CU 



